This worksheet contains material for an introductory QGIS course held at the Cathie Marsh Institute for Social Research on 26 February 2020. All material is available on GitHub.
The aim of this short course is to get you started using QGIS, a (completely free!) piece of software for creating, manipulating, exploring and visualising spatial data. As we will see, QGIS is a powerful tool for generating aesthetically pleasing maps, but it is also invaluable for conducting analysis and substantive exploratory research. The material we cover today will equip you with the skills necessary to begin visualising and analysing your own data. Once you’re comfortable with the skills introduced in this worksheet, please feel free to put them into practice using your own data, or data downloaded from the data resources section of this worksheet.
You can consider today to be something of a ‘crash course’ in QGIS. As such, it’s worth remembering that the materials we cover today form part of a much wider field commonly known as ‘Geographic Information Science’ (GIS). If you are interested in exploring this field further, there are some recommended books in the further reading at the bottom of this page. That said, this course will give you more than enough information to get started exploring spatial data and making maps.
QGIS is an open-source piece of software. This means, for one thing, that it is free, which represents a significant advantage over comparable software like ArcGIS for students and institutions (understandably) unwilling to fork out for licence fees. But that’s just a practical advantage: open-source also means transparency, continuous development and a supportive community of developers and users. QGIS is a key part of a wider, growing movement towards open-source software in geospatial analysis, with tools like GeoDa and GIS functionality in R becoming increasingly popular. A key benefit of QGIS being open-source is that it is constantly evolving, with lots of smart people continuously contributing to new versions and plug-ins to expand its capability.
QGIS is part of a wider open-source movement
You will also find a wealth of documentation and resources online, largely generated by the developers and users themselves. Websites like StackExchange are full of people willing to answer your questions, and I guarantee you that most queries you have will have already been answered somewhere! To date, nearly thirty thousand questions have been asked about QGIS. The friendly online community of QGIS developers and users is possibly the most extensive resource out there!
Before we get to grips with the software itself, let’s cover some preliminary basics, beginning with spatial data types. The diversity of topics in geographical research has motivated the collection of an enormous array of information which can be quantified for use in software like QGIS. Data collected for making maps will inherently include some spatial component, describing the location of an entity in space. It might also incorporate attribute data: non-spatial characteristics which describe entities. There are numerous ways in which spatial data can be stored in GIS software, including QGIS, but the two most common data types are vector and raster.
Vector data represent features in the real world through points, lines and polygons. Standing on top of a skyscraper, overlooking a city, one will observe buildings, parks, street lights and roads, each a discrete feature of the urban landscape. Vector data are made up of vertices, which define the geometry of these features. The simplest geometric form is a single two-dimensional vertex: one X (longitude) and Y (latitude) coordinate describing a specific point location. When vertices are connected in order, with different start and end points, a line is formed. Lines with equal start and end points, with at least three vertices, represent polygons. In our urban landscape, points might be used to represent street lights, lines to represent roads, and polygons to represent buildings. Of course, a great deal of spatial data defines objects which do not physically exist on the ground, such as electoral wards or neighbourhood boundaries. So, the vertices collectively describe where objects are in space, and the attributes describe what those objects are like. Given its popularity in social science research, we will focus on vector data throughout today.
Vector data. Source: Data Carpentry via the National Ecological Observatory Network (NEON)
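To make the point, line and polygon definitions a little more concrete, here is a minimal sketch in plain Python. It is not how QGIS stores vector data internally; the coordinates, the attribute values and the function name geometry_type are all made up for illustration.

```python
# Each vertex is an (x, y) pair: x is longitude, y is latitude.
# All coordinates below are illustrative, not real survey data.
street_light = [(-2.2339, 53.4668)]                     # one vertex
road = [(-2.2400, 53.4650), (-2.2370, 53.4660), (-2.2339, 53.4668)]
building = [
    (-2.2345, 53.4665), (-2.2335, 53.4665),
    (-2.2335, 53.4671), (-2.2345, 53.4671),
    (-2.2345, 53.4665),                                 # repeats the start to close the shape
]


def geometry_type(vertices):
    """Classify an ordered list of vertices as a point, line or polygon."""
    if len(vertices) == 1:
        return "point"
    closed = vertices[0] == vertices[-1]
    # A closed shape needs at least three distinct vertices,
    # so four entries once the start point is repeated at the end.
    if closed and len(vertices) >= 4:
        return "polygon"
    return "line"


# A feature pairs a geometry with non-spatial attribute data.
feature = {
    "geometry": building,
    "attributes": {"name": "Example building", "floors": 4},  # hypothetical values
}

print(geometry_type(street_light), geometry_type(road), geometry_type(feature["geometry"]))
```

The only difference between the road and the building is whether the last vertex returns to the first, which is exactly the distinction between lines and polygons described above.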
In some circumstances, vector data is unsuitable. Looking down from our skyscraper, one might also observe variation in air pollution across the city. This cannot easily or intuitively be represented using vector geometries such as lines or polygons. Air pollution might vary considerably within streets or parks, and consequently, attribute data associated with lines or polygons would mask a great deal of information. In such circumstances, raster data may be able to represent the real world more accurately than vector data. Rasters are made up of a regular grid of cells, each of which contains associated attribute data, and can be used to represent continuous spatial information such as air pollution or remote sensing imagery of the Earth’s surface. The raster data you are most likely to have come across are meteorological maps, such as those used in weather reports. Data about region-wide precipitation or temperature, for instance, is often stored in raster format. As noted earlier, today we are going to focus on vector data, but if you’d like to explore raster data examples, please feel free to explore the raster-specific resources at the end of this worksheet, or give me a shout!
As we noted earlier, maps are representations of the real world. Importantly, these representations tend to be created on a flat surface (a computer screen or piece of paper) even though the earth itself is more-or-less spherical. To portray spatial entities, whether crime locations or any other phenomenon, on a flat surface, we perform a transformation known as a ‘projection’. This is quite the mathematical challenge, and can be carried out in countless different ways, each of which has its own advantages and disadvantages. For instance, until recently, Google Maps used the Mercator projection, which, whilst useful for navigational purposes, distorts the earth in a manner which makes land masses near the equator, such as Africa, appear much smaller than they actually are relative to land masses near the poles, such as Greenland, which appear much larger. For a light-hearted look at different projections of the world map I recommend this blog post.
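The idea of a regular grid of cells can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the pollution readings, the grid origin, the 100-unit cell size and the function name cell_value are all hypothetical, and real raster formats carry far more metadata.

```python
# Hypothetical air pollution readings on a 3 x 4 grid of cells.
# Row 0 is the northern edge of the grid.
pollution = [
    [12.0, 14.5, 18.2, 21.0],
    [11.3, 13.9, 19.7, 24.1],
    [10.8, 12.2, 16.4, 20.5],
]

ORIGIN_X, ORIGIN_Y = 0.0, 300.0   # top-left corner of the grid (assumed)
CELL_SIZE = 100.0                 # width and height of each cell (assumed)


def cell_value(x, y):
    """Return the value of the cell that contains the location (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)   # rows count downwards from the top
    if not (0 <= row < len(pollution) and 0 <= col < len(pollution[0])):
        raise ValueError("location falls outside the raster")
    return pollution[row][col]


print(cell_value(250.0, 250.0))   # a location in the top row, third column
```

Notice that every location inside a cell returns the same value: the cell size sets the level of detail a raster can capture.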
Source: Brilliant Maps
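If you are curious how large the Mercator distortion gets, it can be sketched with a little trigonometry. On a spherical Mercator projection, lengths at a given latitude are stretched by a factor of 1 / cos(latitude), and areas by the square of that. The function names and the example latitudes below are my own choices for illustration.

```python
import math


def linear_scale(latitude_deg):
    """How much a spherical Mercator map stretches lengths at this latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))


def area_scale(latitude_deg):
    """How much a spherical Mercator map inflates areas at this latitude."""
    return linear_scale(latitude_deg) ** 2


# Illustrative latitudes: the equator, 60 degrees north, and 72 degrees
# north (a latitude that passes through Greenland).
for latitude in (0, 60, 72):
    print(latitude, round(linear_scale(latitude), 2), round(area_scale(latitude), 2))
```

At the equator both factors are 1, so nothing is stretched. At 60 degrees lengths are doubled and areas quadrupled, and the inflation keeps growing towards the poles.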
When working within GIS software like QGIS, we are subject to the same restrictions, since we are representing real-world information on a flat computer screen. Any spatial information you are using in QGIS, whether it be tram stop locations, neighbourhood boundaries or river formations, must have an associated Coordinate Reference System (CRS). This ensures that we know how our 2D projected maps relate to the actual features on our spherical earth. You are probably vaguely familiar with the most common type of CRS already: a Geographic Coordinate Reference System, which uses latitude and longitude coordinates to define specific points on the earth’s surface. The most widely used of these is formally known as WGS 84. You might have even noticed that when you select a point in Google Maps, it automatically brings up the latitude and longitude coordinates of that location in a white box at the bottom. It is through this system that we can relate a click on the map to a real place on earth.
An example of latitude and longitude coordinates on the Google Maps online platform
As we’ll find out later today, not all data you have collected or downloaded will use latitude and longitude coordinates. For example, lots of data released in Britain uses a projected CRS called the British National Grid, which uses Eastings and Northings to define locations in Great Britain based on a grid system, rather than longitude and latitude. In fact, many areas of the world have their own projected CRS. It is beyond the scope of this course (and indeed, many GIS users) to discuss the merits and purposes of different CRS. However, it is important to be aware of the CRS associated with your data, and to treat it appropriately within QGIS. Doing so will ensure that you are displaying information accurately, especially when overlaying multiple data sources. We went through this more practically during the live demonstration, and it will be covered again during the exercises later in this worksheet.
If you want to read more about projections and CRS, you can read the excellent QGIS documentation online, or please feel free to ask me!
British National Grid nested grids. Source: Ordnance Survey
Now that we are familiar with some GIS fundamentals, we can move on to opening up QGIS and exploring the interface.
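One quick way to stay aware of your CRS is to look at the size of the numbers. The sketch below is only a rough sanity check, not a substitute for the CRS information stored with your data or reported by QGIS. It assumes two candidates: WGS 84 (EPSG:4326), with longitude and latitude in degrees, and the British National Grid (EPSG:27700), whose grid spans roughly 700 km east to west and 1,300 km south to north in metres. The function name plausible_crs and the example coordinates are made up for illustration.

```python
def plausible_crs(x, y):
    """List the CRS (of the two considered) whose range contains (x, y)."""
    candidates = []
    # Longitude runs from -180 to 180 degrees, latitude from -90 to 90.
    if -180 <= x <= 180 and -90 <= y <= 90:
        candidates.append("EPSG:4326")
    # The British National Grid covers 0-700,000 m east and 0-1,300,000 m north.
    if 0 <= x <= 700_000 and 0 <= y <= 1_300_000:
        candidates.append("EPSG:27700")
    return candidates


print(plausible_crs(-2.24, 53.48))       # looks like longitude and latitude
print(plausible_crs(384000, 398000))     # looks like an Easting and a Northing
```

Small positive values can fit both ranges, so an ambiguous or empty result simply means you need to check the metadata that came with the data.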